Booster: tunable in-place equation evaluation at rewritten subterms (--equation-max-local-steps) by ehildenb · Pull Request #4153 · runtimeverification/haskell-backend

ehildenb · 2026-06-10T18:13:21Z

This PR generalizes booster's equation evaluation strategy. Today, when an equation rewrites a subterm during the bottom-up simplification traversal, re-normalization happens by restarting whole-term passes from the top, so a rewrite chain advances one step per pass. This PR adds a per-pass budget, --equation-max-local-steps N: after a rewrite, the result is re-simplified in place (LLVM evaluation plus re-entering the bottom-up traversal) for up to N steps per pass before deferring back to the global restart loop. N=0 — the default — is exactly the existing restart-only behavior as the degenerate case of the same code path, so this changes nothing out of the box; the knob lets users tune how deep in-place simplification chains may run.

Changes:

New server option --equation-max-local-steps (default 0, identical to current behavior); the budget resets every global pass.
Booster.Pattern.ApplyEquations: on a rewrite in the bottom-up pass, localFixpointEval re-enters the cached traversal in place, counting re-entries against the budget.
Termination stays fully bounded: at most --equation-max-iterations passes times (budget + 1) applications per chain, with two-layer loop detection (per-step chain check inside the budget, whole-term snapshots across passes, which catch cycles longer than the budget).
Unit tests pin the default-0 boundary (identical to the old iteration-limit behavior) and the budgeted behaviors (deeper chains, per-step loop detection, combined bound) in an explicit options window.

Validation:

At the default, the unit suite and rpc-integration tests are unchanged, including the test-log-simplify-json log-trace golden — the default is trace-identical to master.
KEVM dose-response sweep (15 biggest-moving specs, N ∈ {0,1,2,3,4,5,20,100,1000}, all specs passing in every configuration): budgets 1–3 give about −30% total wall-clock (1530 s at N=0 → 1042 s at N=2), with per-spec wins of −30…−50% saturating at N=1–2.
The sweep also shows why this is a knob and not yet a new default: at N≥4 one spec family (ecrecoverloop02) regresses +40–50%, because an in-place chain completes a second nested memory write before the restart-pass concat rules flatten the first, and booster hands the resulting shape to Kore (Kore-side equation applications jump 2 980 → 26 477). The regression is an indicator of missing simplification lemmas / preserves-definedness annotations, not of the strategy.
Re-running the sweep with an extended lemma set that fills those gaps removes the cliff entirely: improvement becomes monotone in the budget, with the best total at N=20 (−35% vs N=0). Once the lemma set keeps simplification in-house, re-simplifying at the same subterm until done is a net performance win. This is the extended lemmas set: Simplification lemmas to improve Haskell booster coverage evm-semantics#2859. On that PR, without this change KEVM has some proofs that have major regressions. Coupled with this PR, and setting N=20, we get only performance wins there.

…attern/ApplyEquations: local-fixpoint evaluation behind --equation-local-fixpoint iterateEquations restarts the full term traversal whenever any node changes, and within a pass a successful application's RHS is not re-evaluated at its position. Each firing therefore triggers a whole-term sweep in which every ancestor of the changed node is a novel term and parents match against unevaluated RHS redexes — a sweep multiplier on equation attempts that grows with the length of causal rewrite chains. Behind the new --equation-local-fixpoint flag (default off, for A/B measurement), the BottomUp equation pass wraps its evaluation hook: when a node is rewritten, run the LLVM pass on the result (preserving the LLVM-before-equations ordering of the global loop) and re-enter the cached recursion on it, normalizing the new subtree in place. Ancestors then see children in final form and the global loop converges in 1-2 passes instead of one per causal chain step. Whole-term snapshots no longer catch node-level oscillations, so the chain of local rewrites along the current path is tracked in a path-scoped localSteps stack (saved/restored around each local recursion, reset for nested predicate evaluation) and checked per step, throwing EquationLoop; maxIterations keeps counting global passes only, so TooManyIterations semantics and its partial result are unchanged. traverseTerm itself is untouched (it is shared with the LLVM and path-condition passes). Validation: with the flag on, a causal chain of depth 101 that exceeds the global-pass limit today normalizes in a single pass, the loopDef oscillation is still detected as EquationLoop, and standard scenarios give identical results. The flag-window test is sequenced after the iteration-limit test it would otherwise race with (the test binary runs tests in parallel). Full unit suite passes; test-retain-condition-cache passes with byte-identical output with --equation-local-fixpoint enabled. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…attern/ApplyEquations: replace the local-fixpoint flag with a per-pass budget, on by default (--equation-max-local-steps) The Bool flag forked evaluation into two strategies and left the in-place recursion unbounded (a single pass could apply arbitrarily many equations, with loop detection as the only brake). Replace it with a budget: at most maxLocalSteps in-place rewrites per traversal pass (--equation-max-local-steps, default 20, chosen on the expectation that a few applications normalize a side condition to its final truth value). On exhaustion, rewritten nodes are returned without recursion — exactly the restart-only strategy — and the global loop picks them up on the next pass. Restart-only evaluation is therefore the budget-0 special case of a single code path rather than a separate mode, the new strategy is the default, and total work is again bounded by maxIterations passes times (budget plus one application per node), with TooManyIterations and its partial result reached as before. Loop detection becomes two-layered: the path-scoped per-step check catches oscillations shorter than the budget immediately, and cycles that survive a pass boundary recur in the existing whole-term snapshots within at most one cycle period of passes (a period dividing the budget repeats the snapshot on the very next pass). Validation: bottom-up chains beyond the old one-application-per-pass limit complete with defaults; a lowered pass limit with the default budget still aborts an over-long chain with a partial result; budget 0 reproduces the old depth-100/101 success/abort boundary exactly; oscillations are detected per local step with the budget on and by whole-term snapshots with it off. Full unit suite passes; test-retain-condition-cache produces byte-identical output with the default budget. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

… binding, ambiguous record updates) The --pedantic CI build (-Werror) rejected the leftover isTooManyIterations helper, unused since the iteration-limit test moved into the budget-0 window, and the EquationOptions record updates, which are ambiguous under DuplicateRecordFields because the field names are shared with EquationConfig. Construct the options explicitly instead. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…log golden for in-place evaluation In-place evaluation attempts the INT.lt hook directly at the rewritten constraint subterm (logged failure on the symbolic argument) and the later simplify pass no longer re-attempts it (cache hit). Log-trace changes only; all RPC responses are unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…efault --equation-max-local-steps to 0 (restart-only) Make the in-place budget opt-in so the default behavior is identical to the existing restart-only evaluation strategy. The unit tests now pin the legacy iteration-limit boundary at the default and exercise the in-place behaviors (deeper chains, per-step loop detection, combined bound) in an explicit budget-20 window. The test-log-simplify-json golden reverts to the upstream trace, guarding that the default changes nothing. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…tart loop and the --equation-max-local-steps budget The booster description's equation section now explains the restart-from-top pass structure that was previously implicit (and described as hard-coded), and how the in-place budget generalises it. Also fixes a typo (traverser) in the strategy bullets. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

jberthold

Results look quite good 🎉
but I would probably change a few details here, not least to make it easier to understand the code and the algorithm it implements.

jberthold · 2026-06-11T11:15:21Z

+        , localSteps = []
+        , localStepCount = 0


Do we have the invariant that length localSteps == localStepCount ?
If that is true then we might not need a separate counter.. but it might be good to have it for large maxLocalSteps instead of computing length many times.
~~I did not find a place where the localSteps are reset in the iteration, only the counter is reset.~~

EDIT: I found where the stack is popped, and the localStepCount is different from what I thought. I'd make it per level to get the mentioned invariant and make it easier to see what is going on.

Done: 3def8ed — the counter is now restored together with the stack around the recursion, so localStepCount == length localSteps is a state invariant; the counter only exists to avoid computing length per step.

jberthold · 2026-06-11T11:20:48Z

+            -- each pass gets a fresh in-place rewriting budget
+            eqState . modify $ \s -> s{localStepCount = 0}


First I was a bit confused here because
a) only the counter is reset here, not the localSteps stack
b) should the counter not be reset at every level of the tree?

EDIT: After commenting on the localFixPointEval I understand that the counter may be counting evaluations of the entire tree during one iteration. I'd make it per level, though.

Done: 3def8ed — the per-pass reset is gone; the counter is path-scoped alongside localSteps (per level, as you suggested), so the budget bounds each in-place chain's depth.

jberthold · 2026-06-11T11:29:22Z

+                                { localSteps = t' : s.localSteps
+                                , localStepCount = s.localStepCount + 1
+                                }
+                        result <- llvmSimplify t' >>= recurse


Is the idea here to repeat the evaluation of t at this level of the AST ? Or all levels from the bottom up up to this level again?
The code is currently doing the latter because above, simp is passed as the recurse parameter.

All levels, and that is deliberate: the rewrite builds a new subterm from the rule's RHS whose arguments need full evaluation before equations at this level (or above) can match decisively. The descent is cut short by the equation cache / isEvaluated at the substituted subject parts, so after the first step of a chain the re-entry costs one cache lookup per argument. Documented in the comment on localFixpointEval in 3def8ed.

The examples I looked at would not benefit from additional simplification at just the top-level, they require stepping into the arguments of the functions that are produced on the RHS, evaluating those, and then evaluating the top-level again.

jberthold · 2026-06-11T11:33:06Z

+                let onEval
+                        | direction == BottomUp = localFixpointEval simp


As commented in localFixpointEval, passing simp here (with BottomUp direction) means that the entire subtree from a node downwards is evaluated repeatedly. Maybe this is cut short because of an isEvaluated flag (see cached definition) so it is quick, but still a traversal.
The intention was to repeat evaluation of the current node in the AST, not re-evaluate all nodes below, so this should probably just use onEval itself for the recursion.

Suggested change

let onEval

| direction == BottomUp = localFixpointEval simp

let onEval

| direction == BottomUp = localFixpointEval onEval

We considered this but kept the full re-entry: re-applying at the node only would leave the RHS's new argument redexes unevaluated this pass, making matches at the node (and at ancestors) indeterminate — for function equations that is an abort, not a skip. The cache already prevents re-traversal of unchanged parts. Rationale added to the localFixpointEval comment in 3def8ed.

jberthold · 2026-06-11T11:47:46Z

+                        eqState . modify $ \s ->
+                            s
+                                { localSteps = t' : s.localSteps
+                                , localStepCount = s.localStepCount + 1
+                                }


Since st has been captured, the code does not have to use modify here. Also, the counter could be pre-evaluated to avoid a thunk (but it will be forced in each iteration anyway).

Suggested change

eqState . modify $ \s ->

s

{ localSteps = t' : s.localSteps

, localStepCount = s.localStepCount + 1

}

let !newCount = st.localStepCount + 1

eqState $ put

st

{ localSteps = t' : st.localSteps

, localStepCount = newCount

}

Done: 3def8ed

jberthold · 2026-06-11T11:48:24Z

+                                , localStepCount = s.localStepCount + 1
+                                }
+                        result <- llvmSimplify t' >>= recurse
+                        eqState . modify $ \s -> s{localSteps = st.localSteps}


This line effectively pops the localSteps stack but does not reset the counter.
A subsequent call to evaluate an AST node upward of this one will inherit the counter. I think the counter could be reset to st.localStepCount after recurse as well, then it would be st.localStepCount == length st.localSteps as a state invariant.

Done: 3def8ed — the counter is reset to st.localStepCount after recurse, establishing the invariant.

…xpoint

…cope the in-place budget (localStepCount == length localSteps) Per review: restore the counter together with the chain around the recursion, making the budget bound each in-place chain's depth rather than the total per pass, with localStepCount == length localSteps as a state invariant (the counter exists only to avoid computing the length per step). The per-pass reset in the global loop becomes redundant and is removed. Also use put on the captured state with a pre-forced counter instead of modify, and align the option help and docs with the per-chain budget semantics. The in-place recursion deliberately re-enters the full bottom-up traversal of the rewritten result (rather than re-applying at the rewritten node only): the rewrite builds a new subterm from the rule's RHS whose arguments need full evaluation, and the equation cache cuts the descent short at already-normal substituted subject parts. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

ehildenb and others added 5 commits June 10, 2026 17:39

ehildenb changed the title ~~Local-fixpoint equation evaluation~~ Booster: tunable in-place equation evaluation at rewritten subterms (--equation-max-local-steps) Jun 10, 2026

ehildenb marked this pull request as ready for review June 10, 2026 23:41

ehildenb requested a review from jberthold June 10, 2026 23:41

ehildenb mentioned this pull request Jun 10, 2026

Simplification lemmas to improve Haskell booster coverage runtimeverification/evm-semantics#2859

Merged

jberthold approved these changes Jun 11, 2026

View reviewed changes

ehildenb and others added 2 commits June 11, 2026 12:58

Merge remote-tracking branch 'upstream/master' into equation-local-fi…

99da566

…xpoint

ehildenb added the automerge label Jun 11, 2026

ehildenb merged commit 5679e50 into master Jun 11, 2026
6 checks passed

ehildenb deleted the equation-local-fixpoint branch June 11, 2026 16:28

ehildenb mentioned this pull request Jun 12, 2026

Update dependency: deps/kevm_release runtimeverification/kontrol#1153

Open

		-- each pass gets a fresh in-place rewriting budget
		eqState . modify $ \s -> s{localStepCount = 0}

-                        eqState . modify $ \s ->
-                            s
-                                { localSteps = t' : s.localSteps
-                                , localStepCount = s.localStepCount + 1
-                                }
+                        let !newCount = st.localStepCount + 1
+                        eqState $ put
+                            st
+                                { localSteps = t' : st.localSteps
+                                , localStepCount = newCount
+                                }

Conversation

ehildenb commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jberthold left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ehildenb Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ehildenb commented Jun 10, 2026 •

edited

Loading

ehildenb Jun 11, 2026 •

edited

Loading